Abstract

A scenario involving a source, a channel, and a destination, where the destination is interested in both reliably reconstructing the message transmitted by the source and estimating with a fidelity criterion the state of the channel, is considered. The source knows the channel statistics, but is oblivious to the actual channel state realization. Herein it is established that a distortion constraint for channel state estimation can be reduced to an additional cost constraint on the source input distribution, in the limit of large coding block length. A newly defined capacity-distortion function thus characterizes the fundamental tradeoff between transmission rate and state estimation distortion. It is also shown that non-coherent communication coupled with channel state estimation conditioned on treating the decoded message as training symbols achieves the capacity-distortion function. Among the various examples considered, the capacity-distortion function for a memoryless Rayleigh fading channel is characterized to within 1.443 bits at high signal-to-noise ratio. The constrained channel coding approach is also extended to multiple access channels, leading to a coupled cost constraint on the input distributions for the transmitting sources.

I. Introduction

In this paper, we consider the problem of joint information transmission and channel state estimation 
over a channel with a random time-varying channel state. The objective is to have the destination recover 
both the message transmitted from the source and the states of the channel over which the message is 
transmitted, under the presumption that the random channel state is available to neither the source nor 
the destination, save its statistics. The problem setting is relevant for general situations where, besides 
communication, the destination needs to identify intrusions (as in secret communication systems [2]) or 
interference (as in dynamic spectrum access systems [3]), to monitor the environment (as in underwater 
acoustic/sonar applications [4]), and others. For example, in active sonar systems, a source can transmit a 
signal, which experiences Rayleigh fading that is a function of the reflected target [5]. Thus, the received 
signal at the receiver is of the form: 

Y = SX + Z 

where X is the transmitted signal, S is the information about the target as revealed in a Rayleigh fading 
channel and Z is channel noise. Extension of our results to the parallel channel enables the consideration 
of multistatic sonar [6]. 

In contrast to much prior work, we consider both information transmission and channel state estimation. 
In the literature, channel state estimation has long been studied with the goal of facilitating information 
transmission, versus as a separate goal in itself; see, e.g., [7]. The channel state estimation therein is 
for information transmission only, and does not compete for resources with the data transmission as we 
consider in this work. 

The problem formulation in [8], [9], [10] bears similarity to that we consider: the destination is 
interested in both information transmission and channel state estimation. However, a critical distinction 
that differentiates our work from this line of prior work, is the fact that in those works the channel state 
is assumed to be non-causally known at the source [8], [9], [10] and thus can be exploited for encoding 
the message. In our formulation, neither the source nor the destination has a priori knowledge of the 
channel state, except its statistics. Consequently, the solution for our problem and those of [8], [9], [10] 
are fundamentally different, as will be elaborated upon in the paper. However, since submission of our
paper, a work [11] that unifies our scenario with that of [8], [9] has been presented, thus connecting the
case of non-causal knowledge of state at the transmitter with the case in which both the transmitter and
the receiver are completely oblivious of the channel state, as we examine herein.

Intuitively, an inherent tradeoff exists between a channel's capability to transfer information and
its capability to reveal state. Information transmission is accomplished by exercising random channel
inputs, thereby increasing the randomness of the channel outputs and thus reducing the destination's 
ability to estimate channel states. Channel state estimation, in contrast, suggests that the source transmit 
deterministic channel inputs, limiting any information transmission through the channel. We quantitatively 
characterize such a fundamental tension in this paper. 

We show that the optimal transmission rate versus state estimation distortion can be formulated as 
a constrained channel coding problem, with the channel input distribution constrained by an average 
cost constraint in which we associate with each input symbol an "estimation cost". The problem
of designing the optimal source then reduces to selecting codebooks which meet the estimation cost
constraint, and the optimal tradeoff between transmission rate and state estimation distortion is characterized
by a function termed the capacity-distortion function. Furthermore, we show that non-coherent
communication coupled with channel state estimation conditioned on treating the decoded message as 
training symbols achieves the capacity-distortion function. We later extend the basic idea to two-user 
multiple access channels (MAC) with channel state estimation at the destination, and characterize the 
capacity region-distortion function for that scenario. The channel state estimation constraint again leads 
to an additional estimation cost constraint on the source distribution; however, this cost constraint is in 
contrast to conventional MAC in that, here, the estimation cost constraint is a coupled constraint for the 
two sources, as opposed to the separate input cost constraints such as an average power constraint at each 
of the sources. Thus, having specified the estimation cost constraint, the sources collaboratively optimize 
their input distributions, even when there are separate additional cost constraints. 

The rest of this paper is organized as follows. Section II describes the basic channel model with discrete
alphabets, and formulates the problem of characterizing the capacity-distortion function. Section III gives
the capacity-distortion function, and establishes its achievability by formulating the constrained
channel coding problem. Section IV proves the converse part of the capacity-distortion function. Section
V extends the results of the previous two sections by adding an average cost constraint on the channel
inputs to the state estimation constraint. Section VI illustrates the application of the capacity-distortion
function through several examples, including characterizing the capacity-distortion function for
a memoryless Rayleigh fading channel to within 1.443 bits at high signal-to-noise ratio (SNR). Section
VII establishes the capacity-distortion region for the two-user MAC with channel state estimation. Finally,
Section VIII concludes the paper.

II. Basic Problem Formulation

In this section, we formulate the joint information transmission and channel state estimation problem. 
For simplicity, we focus on channels with discrete alphabets. 

Message: An index $m$ uniformly selected among $\mathcal{M} = \{1, 2, \ldots, |\mathcal{M}|\}$.

Channel input: A symbol $x$ taken from a finite input alphabet $\mathcal{X} = \{a^{(1)}, a^{(2)}, \ldots, a^{(|\mathcal{X}|)}\}$.

Channel output: A symbol $y$ taken from a finite output alphabet $\mathcal{Y} = \{b^{(1)}, b^{(2)}, \ldots, b^{(|\mathcal{Y}|)}\}$.

Channel state: A symbol $s$ taken from a finite state alphabet $\mathcal{S} = \{c^{(1)}, c^{(2)}, \ldots, c^{(|\mathcal{S}|)}\}$. For each channel use, the state is a random variable $S$ which has a probability mass function (PMF) $P_S(s)$. Over any $n$ consecutive channel uses, the channel state sequence $S^n$ is memoryless, $P(s^n) = \prod_{i=1}^{n} P_S(s_i)$.

Channel: A collection of probability transition matrices, each of which specifies the conditional probability distribution under a fixed channel state; that is, $P(b^{(j)} \mid a^{(i)}, c^{(k)})$ represents the probability of output $y = b^{(j)} \in \mathcal{Y}$ occurring given input $x = a^{(i)} \in \mathcal{X}$ and state $s = c^{(k)} \in \mathcal{S}$, for any $1 \le i \le |\mathcal{X}|$, $1 \le j \le |\mathcal{Y}|$ and $1 \le k \le |\mathcal{S}|$. With $n$ consecutive channel uses, the channel transitions are mutually independent, characterized by $\prod_{i=1}^{n} P(y_i \mid x_i, s_i)$ for output $(y_1, \ldots, y_n) \in \mathcal{Y}^n$ occurring given input $(x_1, \ldots, x_n) \in \mathcal{X}^n$ and state $(s_1, \ldots, s_n) \in \mathcal{S}^n$.

Distortion: For any two channel states, the distortion is a deterministic function, $d : \mathcal{S} \times \mathcal{S} \mapsto \mathbb{R}^+ \cup \{0\}$. It is further assumed that $d(\cdot, \cdot)$ is bounded, i.e., $d(c^{(i)}, c^{(j)}) \le \bar{D} < \infty$ for any $1 \le i, j \le |\mathcal{S}|$. For any two length-$n$ state sequences $(s_1, \ldots, s_n), (s'_1, \ldots, s'_n) \in \mathcal{S}^n$, the average distortion is the average of the pairwise distortions, $(1/n) \sum_{i=1}^{n} d(s_i, s'_i)$.

Coding framework: For each coding block length $n$, an $(|\mathcal{M}|, n)$-code is described by the following components.

• Encoder: A deterministic function, $f_n : \mathcal{M} \mapsto \mathcal{X}^n$. Denote the codewords by $x^n(1), \ldots, x^n(|\mathcal{M}|)$, where $x^n(m) = f_n(m)$ for each $m$.

• Decoder: A deterministic function, $g_n : \mathcal{Y}^n \mapsto \mathcal{M}$.

• State estimator: A deterministic function, $h_n : \mathcal{Y}^n \mapsto \mathcal{S}^n$. We denote $\hat{S}^n = h_n(Y^n)$ as the estimated channel states.

Probability of error for information transmission: We consider the average probability of error, which is defined as

$$P_e^{(n)} = \frac{1}{|\mathcal{M}|} \sum_{m \in \mathcal{M}} \Pr\left[g_n(Y^n) \neq m \mid X^n = f_n(m)\right], \tag{1}$$

where $Y^n$ is induced by the channel input vector $x^n = f_n(m)$ and the channel state vector $S^n$ according to the channel transition probability distributions.

Distortion for channel state estimation: We consider the average distortion, which is defined as

$$d^{(n)} = \frac{1}{|\mathcal{M}|} \sum_{m \in \mathcal{M}} E\left[\frac{1}{n} \sum_{i=1}^{n} d(S_i, \hat{S}_i) \,\middle|\, X^n = f_n(m)\right], \tag{2}$$

where the expectation is over the joint distribution of $(S^n, Y^n)$ conditioned on the message $m \in \mathcal{M}$, noting that $\hat{S}^n$ is determined by $Y^n$.

Achievable transmission-state estimation tradeoff: A pair $(R, D)$, denoting a transmission rate and a state estimation distortion, is said to be achievable if there exists a sequence of $(\lceil e^{nR} \rceil, n)$-codes, indexed by $n = 1, 2, \ldots$, such that $\lim_{n \to \infty} P_e^{(n)} = 0$ and $\limsup_{n \to \infty} d^{(n)} \le D$.

Capacity-distortion function: For every $D \ge 0$, the capacity-distortion function $C(D)$ is the supremum of rates $R$ such that $(R, D)$ is an achievable transmission-state estimation tradeoff.

The central problem in this paper is to characterize $C(D)$.

III. The Capacity-Distortion Function and Proof of Achievability 

In this section, we present the capacity-distortion function and establish its achievability. 
To characterize $C(D)$, we define the following minimal conditional distortion (or estimation cost) function for each channel input symbol $x \in \mathcal{X}$:

$$d^*(x) = \min_{h : \mathcal{X} \times \mathcal{Y} \mapsto \mathcal{S}} E\left[d(S, h(X, Y)) \mid X = x\right], \tag{3}$$

where the function $h : \mathcal{X} \times \mathcal{Y} \mapsto \mathcal{S}$ is a one-shot estimator, and the expectation is over the joint distribution of $(S, Y)$ conditioned upon $X = x$, namely,

$$\Pr[S = s, Y = y \mid X = x] = \Pr[Y = y \mid X = x, S = s] \Pr[S = s \mid X = x] = P(y \mid x, s) P_S(s).$$

Note that $h$ maps a pair of channel input and channel output to a channel state. We denote the function $h$ that attains $d^*(\cdot)$ by $h^*(\cdot, \cdot)$. When more than one one-shot estimator attains $d^*(x)$, an arbitrary one is selected.
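As an illustration of definition (3), the following is a minimal Python sketch that computes $d^*(x)$ and one minimizing estimator for a finite channel by choosing, for each output $y$, the estimate with the smallest posterior-weighted distortion. The function name `estimation_cost`, the dictionary layout of the channel, and the numerical value $r = 0.3$ are assumptions made here for illustration; the toy channel is the $K = 1$ binary multiplicative channel $Y = SX$ with Hamming distortion.

```python
def estimation_cost(x, P_S, P_Y_given_XS, d, outputs):
    """Return (d_star, h_star) for input symbol x, following definition (3).

    P_S: dict state -> probability.
    P_Y_given_XS: dict (y, x, s) -> P(y | x, s).
    d: function (s, s_hat) -> distortion.
    outputs: iterable of output symbols.
    """
    states = list(P_S)
    d_star, h_star = 0.0, {}
    for y in outputs:
        # Joint weight of (s, y) given x: P(y | x, s) * P_S(s).
        weight = {s: P_Y_given_XS[(y, x, s)] * P_S[s] for s in states}

        def expected(s_hat):
            return sum(weight[s] * d(s, s_hat) for s in states)

        # The minimization over h decouples across outputs y.
        best = min(states, key=expected)
        h_star[y] = best
        d_star += expected(best)
    return d_star, h_star


# Toy example (assumed values): Y = S * X, Pr[S = 1] = r, Hamming distortion.
r = 0.3
P_S = {0: 1 - r, 1: r}
channel = {(y, x, s): 1.0 if y == s * x else 0.0
           for y in (0, 1) for x in (0, 1) for s in (0, 1)}


def hamming(s, s_hat):
    return 0.0 if s == s_hat else 1.0


d_star_0, h_star_0 = estimation_cost(0, P_S, channel, hamming, (0, 1))
d_star_1, h_star_1 = estimation_cost(1, P_S, channel, hamming, (0, 1))
```

In this toy case, sending $x = 1$ reveals the state exactly, while sending $x = 0$ forces the estimator to guess the more likely state.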

The capacity-distortion function is given by the following theorem. 

Theorem 1: The capacity-distortion function for the problem considered in Section II is

$$C(D) = \max_{P_X \in \mathcal{P}_D} I(X; Y), \tag{4}$$

where

$$\mathcal{P}_D = \left\{ P_X : \sum_{x \in \mathcal{X}} P_X(x) d^*(x) \le D \right\}. \tag{5}$$

Inspecting Theorem 1, we see that $d^*(x)$ serves as an "estimation cost" due to signaling with $x \in \mathcal{X}$. Hence $\mathcal{P}_D$, as defined by (5), specifies an average cost constraint which regulates the input distribution so that the signaling is "estimation-efficient". Note that here the channel transition probability is marginalized over the channel state $S$, i.e., $\Pr[Y = y \mid X = x] = \sum_{s \in \mathcal{S}} P_S(s) P(y \mid x, s)$.
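To make the optimization in (4)-(5) concrete, here is a rough Python sketch that evaluates $C(D)$ by grid search for a binary-input channel. It is not the authors' procedure: the grid search, the function names, and the example values (a state-marginalized channel with $\Pr[Y = 1 \mid X = 1] = 0.3$ and estimation costs $d^*(0) = 0.3$, $d^*(1) = 0$) are assumptions chosen for illustration. Rates are in nats.

```python
import math


def mutual_information(P_X, P_Y_given_X):
    """I(X;Y) in nats. P_X: dict x -> prob; P_Y_given_X: dict x -> dict y -> prob."""
    P_Y = {}
    for x, px in P_X.items():
        for y, pyx in P_Y_given_X[x].items():
            P_Y[y] = P_Y.get(y, 0.0) + px * pyx
    total = 0.0
    for x, px in P_X.items():
        for y, pyx in P_Y_given_X[x].items():
            if px > 0 and pyx > 0:
                total += px * pyx * math.log(pyx / P_Y[y])
    return total


def capacity_distortion_binary(D, P_Y_given_X, d_star, grid=10001):
    """Grid search of (4)-(5) over P_X(1) = p for the input alphabet {0, 1}."""
    best = None
    for k in range(grid):
        p = k / (grid - 1)
        cost = (1 - p) * d_star[0] + p * d_star[1]  # left side of (5)
        if cost <= D + 1e-12:
            rate = mutual_information({0: 1 - p, 1: p}, P_Y_given_X)
            best = rate if best is None else max(best, rate)
    return best  # None when no input distribution satisfies (5)


# Assumed example: channel already marginalized over the state S.
marginal_channel = {0: {0: 1.0}, 1: {0: 0.7, 1: 0.3}}
costs = {0: 0.3, 1: 0.0}
rates = [capacity_distortion_binary(D, marginal_channel, costs)
         for D in (0.0, 0.1, 0.2, 0.3, 1.0)]
```

For this example the computed rates start at zero for $D = 0$, increase with $D$, and saturate once $D$ reaches the largest estimation cost, in line with the properties listed in Corollary 1.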

Before providing the proof of Theorem 1, we first summarize a few useful properties of $C(D)$ in Corollary 1.

Corollary 1: The capacity-distortion function $C(D)$ in Theorem 1 has the following properties:

1) $C(D)$ is defined for all $D \ge d_{\min} = \min_{x \in \mathcal{X}} d^*(x)$.

2) $C(D)$ is a non-decreasing concave function of $D$ for all $D \ge d_{\min}$.

3) $C(D)$ is a continuous function of $D$ for all $D > d_{\min}$.

4) If $d_{\min}$ is achieved by a unique $x \in \mathcal{X}$, $C(d_{\min}) = 0$.

5) $C(D) = C(\infty)$, for all $D \ge d_{\max} = \max_{x \in \mathcal{X}} d^*(x)$, where $C(\infty)$ is the unconstrained channel capacity.

Property 2) is established in the converse proof in Section IV, and 3) is a direct consequence of Property 2). Properties 1), 4), and 5) are straightforward and thus provided without proof.

In the remainder of this section, we prove the achievability part of Theorem 1.

Proof of achievability: The transmission part of the achievability proof closely follows the standard channel coding theorem. We fix a distribution $P_X \in \mathcal{P}_D$ and generate a $(\lceil e^{nR} \rceil, n)$-code at random according to the constant composition of $P_X$ (see, e.g., [12]), for $R < I(X; Y)$. The channel coding theorem for constant composition codes ensures that there exists a sequence of $(\lceil e^{nR} \rceil, n)$-codes which achieves $\lim_{n \to \infty} P_e^{(n)} = 0$. For each coding block length $n$, we can partition the output space $\mathcal{Y}^n$ into $|\mathcal{M}| = \lceil e^{nR} \rceil$ disjoint subsets $\mathcal{D}_1^{(n)}, \ldots, \mathcal{D}_{|\mathcal{M}|}^{(n)}$ and decode the message index as $g_n(y^n) = m$ if the channel output $y^n$ belongs to $\mathcal{D}_m^{(n)}$.

The next step in the achievability proof concerns the state estimation. After decoding, the destination re-encodes the decoded message to form

$$\hat{X}^n = f_n(g_n(Y^n)). \tag{6}$$

The destination then uses the state estimator $h_n^*$ to compute the channel state estimates according to the following:

$$\hat{S}^n = h_n^*(Y^n) = \left( h^*(\hat{X}_1, Y_1), \ldots, h^*(\hat{X}_n, Y_n) \right). \tag{7}$$



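Now consider the average distortion incurred by the state estimation procedure described above. For every coding block length $n$, for every $m \in \mathcal{M}$, the average distortion is given by

$$
\begin{aligned}
d_m^{(n)} &= E\left[\frac{1}{n} \sum_{i=1}^{n} d(S_i, \hat{S}_i) \,\middle|\, X^n = f_n(m)\right] \\
&= \sum_{(s^n, y^n) \in \mathcal{S}^n \times \mathcal{Y}^n} \Pr[S^n = s^n, Y^n = y^n \mid X^n = f_n(m)] \cdot \frac{1}{n} \sum_{i=1}^{n} d(s_i, \hat{s}_i) \\
&= \sum_{(s^n, y^n) \in \mathcal{S}^n \times \mathcal{D}_m^{(n)}} \Pr[S^n = s^n, Y^n = y^n \mid X^n = f_n(m)] \cdot \frac{1}{n} \sum_{i=1}^{n} d(s_i, \hat{s}_i) \\
&\quad + \sum_{(s^n, y^n) \in \mathcal{S}^n \times (\mathcal{Y}^n \setminus \mathcal{D}_m^{(n)})} \Pr[S^n = s^n, Y^n = y^n \mid X^n = f_n(m)] \cdot \frac{1}{n} \sum_{i=1}^{n} d(s_i, \hat{s}_i) \\
&\le \sum_{(s^n, y^n) \in \mathcal{S}^n \times \mathcal{D}_m^{(n)}} \Pr[S^n = s^n, Y^n = y^n \mid X^n = f_n(m)] \cdot \frac{1}{n} \sum_{i=1}^{n} d(s_i, \hat{s}_i) + \epsilon_m^{(n)} \bar{D} \\
&= \sum_{(s^n, y^n) \in \mathcal{S}^n \times \mathcal{D}_m^{(n)}} \Pr[S^n = s^n, Y^n = y^n \mid X^n = f_n(m)] \cdot \frac{1}{n} \sum_{i=1}^{n} d(s_i, h^*(x_i(m), y_i)) + \epsilon_m^{(n)} \bar{D} \\
&\le \sum_{(s^n, y^n) \in \mathcal{S}^n \times \mathcal{Y}^n} \Pr[S^n = s^n, Y^n = y^n \mid X^n = f_n(m)] \cdot \frac{1}{n} \sum_{i=1}^{n} d(s_i, h^*(x_i(m), y_i)) + \epsilon_m^{(n)} \bar{D} \\
&= E\left[\frac{1}{n} \sum_{i=1}^{n} d(S_i, h^*(X_i, Y_i)) \,\middle|\, X^n = f_n(m)\right] + \epsilon_m^{(n)} \bar{D},
\end{aligned}
\tag{8}
$$

where the expectation is over the distribution of $(S^n, Y^n)$, and we define

$$\epsilon_m^{(n)} = \Pr[g_n(Y^n) \neq m \mid X^n = f_n(m)]. \tag{9}$$

At this point, it directly follows, from the linearity of expectation and the definition of $d^*(\cdot)$ in (3), that

$$d_m^{(n)} \le \frac{1}{n} \sum_{i=1}^{n} d^*(x_i(m)) + \epsilon_m^{(n)} \bar{D}. \tag{10}$$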
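As we average the per-message average distortions $d_m^{(n)}$ over $\mathcal{M}$ following (2), the average distortion satisfies

$$
\begin{aligned}
d^{(n)} &\le \frac{1}{|\mathcal{M}|} \sum_{m \in \mathcal{M}} \frac{1}{n} \sum_{i=1}^{n} d^*(x_i(m)) + \frac{1}{|\mathcal{M}|} \sum_{m \in \mathcal{M}} \epsilon_m^{(n)} \bar{D} \\
&= \frac{1}{n |\mathcal{M}|} \sum_{m \in \mathcal{M}} \sum_{i=1}^{n} d^*(x_i(m)) + P_e^{(n)} \bar{D}.
\end{aligned}
\tag{11}
$$

Recall that the codebook is generated according to a constant composition of input distribution $P_X$. Therefore, by letting $n \to \infty$, we have from (11)

$$\limsup_{n \to \infty} d^{(n)} \le \sum_{x \in \mathcal{X}} P_X(x) d^*(x) \le D, \tag{12}$$

where the last inequality is from the fact that $P_X$ belongs to $\mathcal{P}_D$. Finally, by optimizing the input distribution $P_X$ over $\mathcal{P}_D$, we establish the achievability of $C(D)$.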

IV. Proof of Converse 

In this section, we prove that for every achievable rate-distortion pair (R,D), R < C(D) holds. 

Proof: For an arbitrarily chosen achievable rate-distortion pair $(R, D)$, consider a $(\lceil e^{nR} \rceil, n)$-code that achieves it. Applying Fano's inequality as in the standard channel coding theorem [13, Ch. 7, Sec. 9], we have

$$R \le \frac{1}{n} I(X^n; Y^n) + P_e^{(n)} R + \frac{1}{n}. \tag{13}$$

Here the distribution of $X^n$ is induced by the uniformly selected message, and the distribution of $Y^n$ is correspondingly induced by $X^n$ and $S^n$. Since the channel is memoryless, through standard bounding steps, $R$ is upper bounded by

$$R \le \frac{1}{n} \sum_{i=1}^{n} I(X_i; Y_i) + P_e^{(n)} R + \frac{1}{n}. \tag{14}$$

From the definition of $C(D)$ in Theorem 1, (14) further leads to

$$R \le \frac{1}{n} \sum_{i=1}^{n} C\left( \sum_{x \in \mathcal{X}} P_{X_i}(x) d^*(x) \right) + P_e^{(n)} R + \frac{1}{n}. \tag{15}$$

At this point, we note that $C(D)$ is a non-decreasing and concave function of $D$. The non-decreasing property is clear because $\mathcal{P}_{D_1} \subseteq \mathcal{P}_{D_2}$ for arbitrary $D_1 < D_2$. To see the concavity property, denote the input distribution that achieves $C(D_j)$ ($j = 1, 2$) by $P_X^{(j)}$. For any $\mu \in (0, 1)$, time-sharing between $P_X^{(1)}$ (with a time fraction of $\mu$) and $P_X^{(2)}$ (with a time fraction of $(1 - \mu)$) yields an input distribution whose average estimation cost is at most $\mu D_1 + (1 - \mu) D_2$, and hence leads to

$$
\begin{aligned}
\mu C(D_1) + (1 - \mu) C(D_2) &\le \max_{P_X \in \mathcal{P}_{\mu D_1 + (1 - \mu) D_2}} I(X; Y) \\
&= C(\mu D_1 + (1 - \mu) D_2).
\end{aligned}
\tag{16}
$$

So $C(D)$ is a concave function of $D$.

Utilizing the concavity of $C(D)$, (15) is further upper bounded by

$$R \le C\left( \frac{1}{n} \sum_{i=1}^{n} \sum_{x \in \mathcal{X}} P_{X_i}(x) d^*(x) \right) + P_e^{(n)} R + \frac{1}{n}. \tag{17}$$

In order to complete the converse proof, we need to show that for any sequence of $(R, D)$-achievable codes and for all sufficiently large $n$, the following is satisfied:

$$\frac{1}{n} \sum_{i=1}^{n} \sum_{x \in \mathcal{X}} P_{X_i}(x) d^*(x) \le D. \tag{18}$$

If (18) holds, then we can directly establish the converse since, for all sufficiently large $n$,

$$R \le C(D) + P_e^{(n)} R + \frac{1}{n} \to C(D), \tag{19}$$

with $\lim_{n \to \infty} P_e^{(n)} = 0$.



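Proof of (18): We note that the empirical input distribution $P_{X_i}(x)$ is induced by the uniformly selected message, i.e., $X_i = x_i(m)$ with probability $1/|\mathcal{M}|$, for every $m \in \mathcal{M}$. Hence the left hand side of (18) can be rewritten as

$$
\begin{aligned}
\frac{1}{n} \sum_{i=1}^{n} \sum_{x \in \mathcal{X}} P_{X_i}(x) d^*(x) &= \frac{1}{n} \sum_{i=1}^{n} \frac{1}{|\mathcal{M}|} \sum_{m \in \mathcal{M}} d^*(x_i(m)) \\
&= \frac{1}{n |\mathcal{M}|} \sum_{m \in \mathcal{M}} \sum_{i=1}^{n} E\left[ d(S_i, h^*(X_i, Y_i)) \mid X_i = x_i(m) \right],
\end{aligned}
\tag{20}
$$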

where the expectation is with respect to $S_i$ and $Y_i$ induced by $x_i(m)$. We rewrite the average distortion definition (2) as

$$d^{(n)} = \frac{1}{n |\mathcal{M}|} \sum_{m \in \mathcal{M}} \sum_{i=1}^{n} E\left[ d(S_i, \hat{S}_i) \mid X^n = f_n(m) \right], \tag{21}$$

which, for any arbitrarily small $\epsilon > 0$ and all sufficiently large $n$, has to be no greater than $D + \epsilon$ due to the $(R, D)$-achievability requirement. Comparing (20) and (21), to prove the desired result it thus suffices to show that, for each $m \in \mathcal{M}$ and each $1 \le i \le n$,

$$E\left[ d(S_i, h^*(X_i, Y_i)) \mid X_i = x_i(m) \right] \le E\left[ d(S_i, \hat{S}_i) \mid X^n = f_n(m) \right], \tag{22}$$

where the expectation on the right hand side is with respect to $S^n$ and $Y^n$ induced by the transmitted channel inputs $x^n(m)$. To further reduce the problem, we strengthen the channel state estimator by revealing (via a genie) $x_i(m)$ to the estimator when performing the state estimation. This way, the optimal state estimator for $S_i$, denoted $\tilde{h}^*(\cdot, \cdot)$, solves

$$\min_{\tilde{h} : \mathcal{X} \times \mathcal{Y}^n \mapsto \mathcal{S}} E\left[ d(S_i, \tilde{h}(X_i, Y^n)) \mid X_i = x_i(m) \right] \tag{23}$$

for every revealed $x_i(m)$, $m \in \mathcal{M}$. In contrast, $h^*(\cdot, \cdot)$ solves

$$\min_{h : \mathcal{X} \times \mathcal{Y} \mapsto \mathcal{S}} E\left[ d(S_i, h(X_i, Y_i)) \mid X_i = x_i(m) \right]. \tag{24}$$

Therefore, (22) can be established by showing that

$$\min_{h : \mathcal{X} \times \mathcal{Y} \mapsto \mathcal{S}} E\left[ d(S_i, h(X_i, Y_i)) \mid X_i = x_i(m) \right] \le \min_{\tilde{h} : \mathcal{X} \times \mathcal{Y}^n \mapsto \mathcal{S}} E\left[ d(S_i, \tilde{h}(X_i, Y^n)) \mid X_i = x_i(m) \right]. \tag{25}$$

In words, (25) indicates that knowing the entire channel output sequence $Y^n$ does not lead to better estimation of $S_i$ than knowing only $Y_i$. In order to prove (25), we need the following lemma.

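Lemma 1: For three arbitrary random variables $U \in \mathcal{U}$, $V \in \mathcal{V}$, and $W \in \mathcal{W}$, where $W$ is independent of $(U, V)$, and for an arbitrary function $d : \mathcal{U} \times \mathcal{U} \mapsto \mathbb{R}$, we have

$$\min_{f : \mathcal{V} \mapsto \mathcal{U}} E\left[ d(U, f(V)) \right] = \min_{g : \mathcal{V} \times \mathcal{W} \mapsto \mathcal{U}} E\left[ d(U, g(V, W)) \right]. \tag{26}$$

Proof of Lemma 1: Using the law of total expectation, we have

$$E\left[ d(U, g(V, W)) \right] = E\left[ E\left[ d(U, g(V, W)) \mid W \right] \right], \tag{27}$$

where the inner expectation is over $(U, V)$ and the outer expectation is over $W$. Noting that $W$ is independent of $(U, V)$, we have

$$
\begin{aligned}
\min_{g : \mathcal{V} \times \mathcal{W} \mapsto \mathcal{U}} E\left[ d(U, g(V, W)) \right] &= \min_{g : \mathcal{V} \times \mathcal{W} \mapsto \mathcal{U}} E\left[ E\left[ d(U, g(V, W)) \mid W \right] \right] \\
&\ge E\left[ \min_{g : \mathcal{V} \times \mathcal{W} \mapsto \mathcal{U}} E\left[ d(U, g(V, W)) \mid W \right] \right] \\
&= E\left[ \min_{f : \mathcal{V} \mapsto \mathcal{U}} E\left[ d(U, f(V)) \right] \right] \\
&= \min_{f : \mathcal{V} \mapsto \mathcal{U}} E\left[ d(U, f(V)) \right].
\end{aligned}
\tag{28}
$$

On the other hand, since $f : \mathcal{V} \mapsto \mathcal{U}$ is a special form of $g : \mathcal{V} \times \mathcal{W} \mapsto \mathcal{U}$, we have

$$\min_{f : \mathcal{V} \mapsto \mathcal{U}} E\left[ d(U, f(V)) \right] \ge \min_{g : \mathcal{V} \times \mathcal{W} \mapsto \mathcal{U}} E\left[ d(U, g(V, W)) \right]. \tag{29}$$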

Combining (28) and (29) establishes Lemma 1.

Now let us apply Lemma 1 to (25). We recognize $S_i$ as $U$, $Y_i$ as $V$, and $(Y_1, \ldots, Y_{i-1}, Y_{i+1}, \ldots, Y_n)$ as $W$, and note that with $x_i(m)$ given, $(Y_1, \ldots, Y_{i-1}, Y_{i+1}, \ldots, Y_n)$ is independent of $(S_i, Y_i)$ due to the memoryless property of the channel. Therefore Lemma 1 indicates that (25) holds with equality. Consequently, (18) holds, thus concluding the converse proof of Theorem 1.
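Lemma 1 can be sanity-checked numerically on small alphabets. The sketch below is only an illustration under assumed values: it enumerates every estimator of $U$ from $V$ alone and every estimator from $(V, W)$, with $W$ independent of $(U, V)$, and compares the two minimum expected distortions. The function name, the joint PMF, and the use of Hamming distortion are all hypothetical choices.

```python
from itertools import product


def min_expected_distortion(P_UV, P_W, d, use_w):
    """Brute-force minimum of E[d(U, est)] over all estimators.

    P_UV: dict (u, v) -> probability; P_W: dict w -> probability.
    W is taken independent of (U, V), as Lemma 1 requires.
    use_w=False searches f(V); use_w=True searches g(V, W).
    """
    U = sorted({u for u, _ in P_UV})
    V = sorted({v for _, v in P_UV})
    W = sorted(P_W)
    keys = [(v, w) for v in V for w in W] if use_w else [(v,) for v in V]
    best = float("inf")
    for choice in product(U, repeat=len(keys)):
        estimator = dict(zip(keys, choice))
        cost = 0.0
        for (u, v), p_uv in P_UV.items():
            for w, p_w in P_W.items():
                key = (v, w) if use_w else (v,)
                cost += p_uv * p_w * d(u, estimator[key])
        best = min(best, cost)
    return best


# Assumed toy distributions.
P_UV = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
P_W = {0: 0.6, 1: 0.4}


def hamming(u, u_hat):
    return 0.0 if u == u_hat else 1.0


without_w = min_expected_distortion(P_UV, P_W, hamming, use_w=False)
with_w = min_expected_distortion(P_UV, P_W, hamming, use_w=True)
```

Both searches return the same minimum for this example, which matches the statement that independent side information cannot improve the estimate.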

V. Channels with Input Constraints 

In this section, we extend Theorem 1 to the scenario where, besides the state estimation constraint, there also exists an average cost constraint on the channel inputs.

For the basic problem formulation presented in Section II, we introduce a cost function $v(\cdot) : \mathcal{X} \mapsto \mathbb{R}^+ \cup \{0\}$, which associates each input letter with a certain nonnegative cost. For a given sequence of inputs $(x_1, \ldots, x_n) \in \mathcal{X}^n$, the resulting total input cost is $\sum_{i=1}^{n} v(x_i)$. For an $(|\mathcal{M}|, n)$-code, the average input cost is defined as

$$v^{(n)} = \frac{1}{n |\mathcal{M}|} \sum_{m \in \mathcal{M}} \sum_{i=1}^{n} v(x_i(m)). \tag{30}$$

Subsequently, a tuple $(R, D, V)$ can be used to describe a tradeoff between transmission rate, state estimation distortion, and input cost, which is achievable if there exists a sequence of $(\lceil e^{nR} \rceil, n)$-codes, indexed by $n = 1, 2, \ldots$, such that $\lim_{n \to \infty} P_e^{(n)} = 0$, $\limsup_{n \to \infty} d^{(n)} \le D$, and $\limsup_{n \to \infty} v^{(n)} \le V$. Therefore, the capacity-distortion-cost function $C(D, V)$ is the supremum of rates $R$ such that $(R, D, V)$ is an achievable tradeoff. It is customary to fix $V$ and consider the capacity-distortion function $C(D)$ under that fixed $V$, as we will do in some examples in Section VI.

Under such an average input constraint, Theorem 1 is extended to the form described by the following theorem.

Theorem 2: The capacity-distortion-cost function is

$$C(D, V) = \max_{P_X \in \mathcal{P}_D \cap \mathcal{P}_V} I(X; Y), \tag{31}$$

where

$$\mathcal{P}_D = \left\{ P_X : \sum_{x \in \mathcal{X}} P_X(x) d^*(x) \le D \right\}, \tag{32}$$

$$\mathcal{P}_V = \left\{ P_X : \sum_{x \in \mathcal{X}} P_X(x) v(x) \le V \right\}. \tag{33}$$

Proof: The achievability of $C(D, V)$ follows from that of $C(D)$ as developed in the achievability proof of Theorem 1, combined with consideration of the average input cost constraint, cf. [14, Sec. 3.4].

To establish the converse part, the argument is as follows. From Theorem 1 and the standard capacity-cost result [14, Chap. 3] respectively, we have that any achievable $(R, D, V)$ should satisfy

$$R \le \max_{P_X \in \mathcal{P}_D} I(X; Y), \tag{34}$$

and

$$R \le \max_{P_X \in \mathcal{P}_V} I(X; Y). \tag{35}$$

Assume that there exists $R > C(D, V) = \max_{P_X \in \mathcal{P}_D \cap \mathcal{P}_V} I(X; Y)$ such that the tuple $(R, D, V)$ is achievable. Then from (34) and (35), we have either $P_X \notin \mathcal{P}_V$ or $P_X \notin \mathcal{P}_D$, which would in turn violate (33) or (34), respectively. Therefore, no rate $R > C(D, V)$ can be achievable, and the converse of Theorem 2 is established.

VI. Examples 

In this section, we illustrate through examples the capacity-distortion function characterized in the 
previous sections. The first example examines a simple scenario where the estimation costs are uniform, 
and specifically shows that for a state-dependent Gaussian channel the capacity-distortion function behaves 
quite differently than that for the system with the state information at the transmitter. The second example 
evaluates the capacity-distortion function for certain binary multiplicative channels, and shows that the 
capacity-distortion function exceeds the tradeoff achieved by training. The third example considers a 
memoryless Rayleigh fading channel, characterizing its capacity-distortion function within 1.443 bits 
(i.e., one nat) at high SNR. 

A. Channels with Uniform Estimation Costs 

A special case is that $d^*(x)$, the estimation cost as defined in (3), is a constant $d_0$ for all $x \in \mathcal{X}$. For this type of channel, the average cost constraint in (5) exhibits a singular behavior. If $D < d_0$, the joint transmission and state estimation problem is infeasible; otherwise, $\mathcal{P}_D$ consists of all possible input distributions, and thus the capacity-distortion function $C(D)$ is equal to the unconstrained capacity of the channel. One of the simplest channels with uniform estimation costs is the additive channel $Y_i = X_i + S_i + Z_i$, for which, once the destination reliably decodes the message, it can subtract $X_i$ from $Y_i$ so that the estimation of $S_i$ becomes independent of $X_i$.

We now briefly contrast our results for the capacity-distortion function, where the transmitter is oblivious to the channel state, with the work of [8], [9], [10], wherein the transmitter knows the channel state non-causally. Consider the state-dependent Gaussian channel:

$$Y_i = X_i + S_i + Z_i,$$

where $S_i \sim \mathcal{N}(0, Q)$, $Z_i \sim \mathcal{N}(0, N)$ and the transmitted signal has a power constraint of $P$. It is straightforward to show that, for the mean-squared error distortion metric, the capacity-distortion function is $C(D) = \frac{1}{2} \log\left(1 + \frac{P}{Q + N}\right)$ for $D \ge \frac{QN}{Q + N}$ and zero otherwise. In contrast, if the transmitter knows the channel state [9, Thm. 2], the system can achieve the following tradeoff:

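$$R \le \frac{1}{2} \log\left(1 + \frac{\gamma P}{N}\right),$$

$$D \ge \frac{Q(\gamma P + N)}{\left(\sqrt{Q} + \sqrt{(1 - \gamma) P}\right)^2 + \gamma P + N},$$

for $0 \le \gamma \le 1$. It is clear that channel knowledge at the transmitter enables both an increase in capacity and a reduction in the distortion of channel state estimation.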



B. A Binary Multiplicative Channel 

We next consider an example that, while somewhat simple, facilitates drawing insights about the nature of joint transmission and state estimation and employs a distortion metric alternative to the mean-squared error. Consider the following,

$$\underline{Y}_i = S_i \underline{X}_i, \tag{36}$$

where $\underline{X}$ and $\underline{Y}$ are length-$K$ blocks so that the super-symbols in the block memoryless channel have alphabets $\mathcal{X}^K = \mathcal{Y}^K = \{0, 1\}^K$ and the multiplication is in the common sense for real numbers. The channel state $S \in \mathcal{S} = \{0, 1\}$ remains fixed for each block, and changes in a memoryless fashion across blocks. We denote $\Pr[S = 1] = r \le 1/2$. We adopt the Hamming distance as the distortion measure: $d(s, \hat{s}) = 1$ if and only if $\hat{s} \neq s$ and zero otherwise. We can view $S$ as the status of a jamming source, a fading level, or the status of a primary transmitter in cognitive radio systems. Activating $S$ to its "effective status" $S = 0$ essentially shuts down the link between $X$ and $Y$; otherwise, the link from $X$ to $Y$ is noiseless. The tradeoff between communication and channel estimation is straightforward to observe from the nature of the channel: for good estimation of $S$, we want $x = 1$ as often as possible, whereas this would reduce the achieved information rate.

For $K \ge 2$, there are $2^K$ possible vectors for an input super-symbol. All $\underline{x}$ except for the all-zero case $\underline{x} = \underline{0}$ lead to the same conditional distribution for $\underline{Y}$ as well as the same minimum conditional distortion $d^*(\underline{x}) = 0$. From the concavity of mutual information with respect to input distribution, the optimal input distribution should take the following form:

$$P_{\underline{X}}(\underline{0}) = 1 - p, \quad \text{and} \quad P_{\underline{X}}(\underline{x}) = p / (2^K - 1), \ \forall \underline{x} \neq \underline{0}.$$



We can find that the channel mutual information per channel use is

$$\frac{1}{K} I(\underline{X}; \underline{Y}) = \frac{1}{K} \left\{ H_2(pr) + p \cdot \left[ r \log(2^K - 1) - H_2(r) \right] \right\} \tag{37}$$

and that the average distortion constraint is $(1 - p) r \le D$. The resulting solution for maximizing the mutual information is

Case 1: $2^K \ge 1 + (1 - r)^{-1/r}$:

$$p^* = 1, \quad C(D) = \frac{r \log(2^K - 1)}{K} \ge 0.$$

Case 2: $2^K < 1 + (1 - r)^{-1/r}$: if $D \ge r - \left[ 1 + \frac{e^{H_2(r)/r}}{2^K - 1} \right]^{-1}$,

$$p^* = \frac{1}{r} \left[ 1 + \frac{e^{H_2(r)/r}}{2^K - 1} \right]^{-1}, \quad C(D) = \frac{1}{K} \left\{ H_2(p^* r) + p^* \left[ r \log(2^K - 1) - H_2(r) \right] \right\};$$

otherwise

$$p^* = 1 - \frac{D}{r}, \quad C(D) = \frac{1}{K} \left\{ H_2(r - D) + \left(1 - \frac{D}{r}\right) \left[ r \log(2^K - 1) - H_2(r) \right] \right\}.$$

Case 1 arises because if the channel block length $K$ is sufficiently large such that $2^K \ge 1 + (1 - r)^{-1/r}$, then the resulting $p^*$ as given by Case 2 would be greater than one, which is impossible. In Case 1, we have $P_{\underline{X}}(\underline{0}) = 0$, and all the nonzero symbols are selected with equal probability $1/(2^K - 1)$. In fact, Case 1 already applies for rather small values of $K$. In our channel model with $r \in [0, 1/2]$, for $r$ smaller than 0.175, Case 1 arises for $K \ge 2$; and for all $r$ larger than 0.175, Case 1 arises for $K \ge 3$.
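The two cases above can be evaluated with a few lines of Python. The sketch below is one possible reading of the closed-form solution, not the authors' code: it uses natural logarithms throughout (rates in nats), relies on the concavity of (37) in $p$ to clamp the unconstrained stationary point to the feasible interval $[1 - D/r, 1]$, and uses function names and test values that are assumptions made for illustration.

```python
import math


def h2(q):
    """Binary entropy in nats."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log(q) - (1 - q) * math.log(1 - q)


def rate_per_use(p, r, K):
    """Mutual information per channel use, eq. (37), in nats."""
    return (h2(p * r) + p * (r * math.log(2**K - 1) - h2(r))) / K


def case1_applies(r, K):
    """Condition 2^K >= 1 + (1 - r)^(-1/r) under which p* = 1 for every D."""
    return 2**K >= 1 + (1 - r) ** (-1 / r)


def capacity_distortion(D, r, K):
    """Return (C(D), p*) for 0 < r <= 1/2 and D >= 0."""
    # Stationary point of (37) in p, ignoring the distortion constraint.
    p_free = 1.0 / (r * (1.0 + math.exp(h2(r) / r) / (2**K - 1)))
    p_free = min(p_free, 1.0)           # Case 1: stationary point beyond p = 1
    p_min = max(0.0, 1.0 - D / r)       # constraint (1 - p) r <= D
    p_star = max(p_free, p_min)         # (37) is concave in p, so clamping is optimal
    return rate_per_use(p_star, r, K), p_star


c_zero, p_zero = capacity_distortion(0.0, 0.3, 2)
c_small, _ = capacity_distortion(1e-6, 0.3, 1)
```

With these definitions, `case1_applies` reproduces the threshold near $r = 0.175$ for $K = 2$ quoted above, and for $K = 1$ the ratio `c_small / 1e-6` is close to $-\log(1 - r)/r$.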

Numerical evaluation of C(D) reveals the trends described above. For relatively large D, the average 
distortion constraint is not active, and thus the optimal input distribution coincides with that for the 
unconstrained channel capacity. As the estimation distortion constraint D falls below a threshold, the 
average distortion constraint becomes active, and the capacity-distortion function C(D) decreases from 
the unconstrained channel capacity. 

For $K = 1$, we can show that as $D \to 0$,

$$C(D) = -\frac{\log(1 - r)}{r} D + o(D), \tag{38}$$

which reflects a linear increase in capacity as we loosen the distortion requirement. For $K > 1$, we have

$$C(0) = \frac{r \log(2^K - 1)}{K} > 0. \tag{39}$$

For comparison, let us consider a suboptimal approach based upon training: the source transmits $X = 1$ in the first channel use in each channel block. The receiver can thus perfectly estimate the channel state $S$ and achieve $D = 0$. The encoder can then use the remaining $(K - 1)$ channel uses in each channel block to encode information, and the resulting achievable rate is

$$R(0) = \frac{r \log 2^{K-1}}{K}. \tag{40}$$

Comparing $C(0)$ and $R(0)$, we notice that their ratio approaches one as $K \to \infty$, consistent with the intuition that training usually leads to negligible rate loss for channels with long coherence blocks. But for small coherence blocks, the joint approach outperforms the training-based approach.
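The comparison between (39) and (40) is easy to tabulate. The following short Python sketch, with hypothetical function names and natural logarithms, evaluates both zero-distortion rates and their ratio for a few block lengths.

```python
import math


def joint_rate_zero_distortion(r, K):
    """C(0) from (39), in nats per channel use."""
    return r * math.log(2**K - 1) / K


def training_rate_zero_distortion(r, K):
    """R(0) from (40): one training symbol, K - 1 data symbols per block."""
    return r * (K - 1) * math.log(2) / K


def rate_ratio(K):
    """C(0) / R(0); the state probability r cancels, so any r > 0 gives the same ratio."""
    return joint_rate_zero_distortion(0.3, K) / training_rate_zero_distortion(0.3, K)


ratios = {K: rate_ratio(K) for K in (2, 4, 8, 100)}
```

The ratio exceeds one for every $K \ge 2$ and decreases toward one as $K$ grows, which is the behavior described in the text.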

C. Memoryless Rayleigh Fading Channel 

Consider a discrete-time memoryless Rayleigh fading channel with scalar input and output, as 

Y = SX + Z, (41) 

where $X \in \mathbb{C}$ is the channel input, and $Y \in \mathbb{C}$ is the channel output. There is an average power constraint on $X$, as

$$E\left[ |X|^2 \right] \le \rho. \tag{42}$$

The fading coefficient $S \in \mathbb{C}$ is the channel state to estimate, following a zero-mean unit-variance circular complex Gaussian distribution, $\mathcal{CN}(0, 1)$. The additive noise $Z \in \mathbb{C}$ is also $\mathcal{CN}(0, 1)$. The distortion function is quadratic, i.e.,

$$d(s, \hat{s}) = |s - \hat{s}|^2. \tag{43}$$

Therefore, the optimal one-shot estimator $h^*(x, y)$ is the minimum-mean-squared-error (MMSE) estimator, as

$$h^*(x, Y) = \frac{x^\dagger}{|x|^2 + 1} Y, \tag{44}$$

where $x^\dagger$ denotes the complex conjugate of $x$, and the resulting estimation cost $d^*(x)$ is the MMSE

$$d^*(x) = \frac{1}{|x|^2 + 1}. \tag{45}$$

So the capacity-distortion function $C(D)$ is characterized by the following optimization:

$$
\begin{aligned}
\max_{P_X} \quad & I(X; Y), \\
\text{s.t.} \quad & \int_{\mathbb{C}} |x|^2 \, dP_X(x) \le \rho, \\
& \int_{\mathbb{C}} \frac{1}{|x|^2 + 1} \, dP_X(x) \le D.
\end{aligned}
\tag{46}
$$
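The estimation cost in (45), which drives the distortion constraint of the optimization above, can be checked by simulation. The following Monte Carlo sketch is an illustration only: the function names, the sample size, and the seed are assumptions, and it uses only the Python standard library. It draws $S$ and $Z$ as $\mathcal{CN}(0, 1)$, applies the estimator (44), and averages the squared error.

```python
import random


def mmse_cost(x):
    """Closed-form estimation cost d*(x) from (45)."""
    return 1.0 / (abs(x) ** 2 + 1.0)


def simulate_mmse(x, trials=100_000, seed=0):
    """Empirical mean of |S - h*(x, Y)|^2 for Y = S x + Z with S, Z ~ CN(0, 1)."""
    x = complex(x)
    rng = random.Random(seed)
    sigma = 0.5 ** 0.5  # each real dimension of CN(0, 1) has variance 1/2
    total = 0.0
    for _ in range(trials):
        s = complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        z = complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        y = s * x + z
        s_hat = x.conjugate() * y / (abs(x) ** 2 + 1.0)  # estimator (44)
        total += abs(s - s_hat) ** 2
    return total / trials


empirical_at_2 = simulate_mmse(2.0)
empirical_at_0 = simulate_mmse(0.0)
```

The empirical averages should land close to `mmse_cost(2.0)` and `mmse_cost(0.0)`, showing how inputs near zero are expensive for state estimation while large-magnitude inputs are cheap.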

Even without the channel state estimation constraint (that is, D > 1), neither the capacity nor the 
capacity-achieving input distribution of (l46l ) is fully known. It has been proved in [15] that the (power- 
constrained) channel capacity is achieved by a discrete input distribution with a finite number of mass 
points, including a mass point at X = 0. For the high-SNR regime, it is also known that the channel 
capacity grows double-logarithmically, i.e., C = O(loglogp) as p — > oo [16]. More precisely, it is 
established in [17] that for fairly general non-coherent fading channels, C = log log p + \ + as 
p — > oo, where x is a constant and is called the fading number. For the scalar memoryless Rayleigh 
fading channel considered here, the fading number x = — 1 — 7 where 7 = 0.5772... is Euler's constant. 

For the $C(D)$ optimization problem (46), we note that the two constraints have conflicting effects
on the distribution of $X$. The average power constraint tends to "stretch" the support set of $X$ toward
zero because otherwise a certain amount of input power would be wasted; in contrast, the channel
state estimation constraint tends to "push" the support set of $X$ away from zero because otherwise the
average distortion may violate the constraint. We focus on the high-SNR regime with $\rho$ growing without
bound, in which it is possible to simultaneously achieve a large (increasing without bound as $\rho \to \infty$)
transmission rate and a small (decreasing toward zero as $\rho \to \infty$) estimation distortion. The following
theorem characterizes some asymptotic behaviors of $C(D)$ as $\rho \to \infty$.
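
The tension between the two constraints can be seen with a toy two-point ("on-off") input, sketched below in Python. This input is an illustrative assumption and not the optimizer of the $C(D)$ problem: it places mass $1 - p$ at $X = 0$ and mass $p$ at $|X|^2 = \rho / p$, so the power constraint holds with equality, while the average estimation cost is at least $1 - p$ because the MMSE at $x = 0$ equals one.

```python
def on_off_input(rho: float, p_on: float):
    """Return (average power, average estimation cost) of a two-point input.

    The input is X = 0 with probability 1 - p_on and |X|^2 = rho / p_on
    with probability p_on, so the average power equals rho.
    """
    power_on = rho / p_on
    avg_power = p_on * power_on
    # The MMSE is 1 at the zero mass point and 1 / (|x|^2 + 1) otherwise.
    avg_cost = (1.0 - p_on) * 1.0 + p_on / (power_on + 1.0)
    return avg_power, avg_cost

# Illustrative sweep: more mass at zero means a larger average distortion.
rho = 100.0
sweep = {p_on: on_off_input(rho, p_on) for p_on in (0.1, 0.5, 1.0)}
```

In this sketch every choice of `p_on` uses the full power budget, yet only inputs with little mass at zero can meet a small distortion target $D$, which is the "push away from zero" effect described above.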

Theorem 3: For the discrete-time memoryless Rayleigh fading channel (41) with average power constraint
$\rho$ and channel state estimation constraint $D$:

1) If $\lim_{\rho \to \infty} D\rho^{\alpha} = \kappa$, where $0 < \alpha < 1$ and $0 < \kappa < \infty$ are both constants, then for sufficiently large
$\rho$, $C(D)$ satisfies

$$\log\log\rho + \log(1 - \alpha) - 1 - \gamma \le C(D) \le \log\log\rho + \log(1 - \alpha) - \gamma. \tag{47}$$

2) If $\lim_{\rho \to \infty} D\rho = \kappa < \infty$, then $C(D)$ remains bounded as $\rho \to \infty$.

The proof of Theorem 3 is in the Appendix, and is based on an induced additive-noise model for the
memoryless Rayleigh fading channel introduced in [18].
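
To give a feel for the bounds in part 1) of Theorem 3, the short Python sketch below evaluates both sides of (47) in nats and converts their gap to bits. The values of $\rho$ and $\alpha$ are arbitrary illustrations, and since (47) is stated only for sufficiently large $\rho$, the numbers are indicative rather than guaranteed at any finite SNR.

```python
import math

EULER_GAMMA = 0.5772156649015329

def theorem3_bounds(rho: float, alpha: float):
    """Lower and upper bounds of (47), in nats, for 0 < alpha < 1 and rho > e."""
    common = math.log(math.log(rho)) + math.log(1.0 - alpha)
    return common - 1.0 - EULER_GAMMA, common - EULER_GAMMA

lower, upper = theorem3_bounds(rho=1e6, alpha=0.5)   # illustrative values
gap_bits = (upper - lower) / math.log(2.0)            # 1 nat, about 1.443 bits
```

The gap between the two bounds is exactly one nat regardless of $\rho$ and $\alpha$, which matches the 1.443-bit figure quoted in the abstract.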
VII. Extension to Two-User MAC

In this section, we extend the transmission versus state estimation problem to the two-user MAC with
state estimation, establishing its capacity-distortion region. We consider a two-user discrete memoryless
MAC with state, whose channel transition probability distribution is described by $P(y|x_1, x_2, s)$, where
$x_1 \in \mathcal{X}_1$ and $x_2 \in \mathcal{X}_2$ are the inputs from the first and second sources, respectively, with input alphabets
$\mathcal{X}_1$ and $\mathcal{X}_2$. The channel state $S$ is a random variable with PMF $P_S(s)$ over the state alphabet $\mathcal{S}$. The
channel output alphabet is $\mathcal{Y}$.

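We consider an $(|\mathcal{M}_1|, |\mathcal{M}_2|, n)$-code, which consists of two encoders $f_{1,n}: \mathcal{M}_1 \to \mathcal{X}_1^n$ and
$f_{2,n}: \mathcal{M}_2 \to \mathcal{X}_2^n$, a decoder $g_n: \mathcal{Y}^n \to \mathcal{M}_1 \times \mathcal{M}_2$, and a state estimator $h_n: \mathcal{Y}^n \to \mathcal{S}^n$. Due to the presence of
two sources, we define the average probability of decoding error as

$$P_e^{(n)} = \frac{1}{|\mathcal{M}_1||\mathcal{M}_2|} \sum_{(m_1, m_2) \in \mathcal{M}_1 \times \mathcal{M}_2} \Pr\left[ g_n(Y^n) \ne (m_1, m_2) \,\middle|\, X_1^n = f_{1,n}(m_1), X_2^n = f_{2,n}(m_2) \right], \tag{48}$$

and the average distortion of state estimation as

$$d^{(n)} = \frac{1}{|\mathcal{M}_1||\mathcal{M}_2|} \sum_{(m_1, m_2) \in \mathcal{M}_1 \times \mathcal{M}_2} \mathbb{E}\left[ \frac{1}{n} \sum_{i=1}^{n} d(S_i, \hat{S}_i) \,\middle|\, X_1^n = f_{1,n}(m_1), X_2^n = f_{2,n}(m_2) \right]. \tag{49}$$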

We say that a tuple $(R_1, R_2, D)$ is achievable if there exists a sequence of $(\lceil e^{nR_1} \rceil, \lceil e^{nR_2} \rceil, n)$-codes,
indexed by $n = 1, 2, \ldots$, such that $\lim_{n \to \infty} P_e^{(n)} = 0$ and $\limsup_{n \to \infty} d^{(n)} \le D$, and define the capacity-distortion
region $\mathcal{C}(D)$ as the closure of the set of rate pairs $(R_1, R_2)$ such that $(R_1, R_2, D)$ is an achievable
transmission-state estimation tradeoff.

Analogous to the single-user case, we define the minimal conditional distortion, or estimation cost,
for the two-user MAC as

$$d^*(x_1, x_2) = \min_{h} \mathbb{E}\left[ d(S, h(X_1, X_2, Y)) \,\middle|\, X_1 = x_1, X_2 = x_2 \right], \tag{50}$$

for $(x_1, x_2) \in \mathcal{X}_1 \times \mathcal{X}_2$.

Combining the proof of the single-user capacity-distortion theorem with that of the standard MAC coding theorem (see, e.g., [13, Thm.
15.3.1]), we have the following theorem characterizing the capacity-distortion region.

Theorem 4: For the two-user state-dependent MAC, its capacity-distortion region $\mathcal{C}(D)$ is the union
of all $(R_1, R_2)$ satisfying

$$R_1 \le I(X_1; Y | X_2, Q), \tag{51}$$

$$R_2 \le I(X_2; Y | X_1, Q), \tag{52}$$

$$R_1 + R_2 \le I(X_1, X_2; Y | Q), \tag{53}$$

over product distributions $P_Q(q) P_{X_1|Q}(x_1|q) P_{X_2|Q}(x_2|q) P_{Y|X_1,X_2}(y|x_1, x_2)$ on $\mathcal{Q} \times \mathcal{X}_1 \times \mathcal{X}_2 \times \mathcal{Y}$ satisfying

$$\sum_{(q, x_1, x_2) \in \mathcal{Q} \times \mathcal{X}_1 \times \mathcal{X}_2} P_Q(q) P_{X_1|Q}(x_1|q) P_{X_2|Q}(x_2|q) \, d^*(x_1, x_2) \le D. \tag{54}$$

Here the cardinality satisfies $|\mathcal{Q}| \le 5$.
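
The coupled cost constraint (54) can be evaluated directly for finite alphabets, as in the Python sketch below. The binary alphabets, the cost table `d_star`, and all probabilities are hypothetical numbers chosen only to illustrate the computation; they do not come from any channel in the paper.

```python
def average_estimation_cost(p_q, p_x1_given_q, p_x2_given_q, d_star):
    """Left-hand side of (54) for finite alphabets.

    p_q[q] is P_Q(q); p_x1_given_q[q][x1] and p_x2_given_q[q][x2] are the
    conditional input PMFs; d_star[x1][x2] is the estimation cost d*(x1, x2).
    """
    total = 0.0
    for q, pq in enumerate(p_q):
        for x1, p1 in enumerate(p_x1_given_q[q]):
            for x2, p2 in enumerate(p_x2_given_q[q]):
                total += pq * p1 * p2 * d_star[x1][x2]
    return total

# Hypothetical binary-input example (numbers made up for illustration).
d_star = [[1.0, 0.5],
          [0.5, 0.2]]
p_q = [0.5, 0.5]                           # time-sharing between two modes
p_x1_given_q = [[1.0, 0.0], [0.0, 1.0]]    # user 1 sends 0 if Q = 0, 1 if Q = 1
p_x2_given_q = [[0.0, 1.0], [0.0, 1.0]]    # user 2 always sends 1
cost = average_estimation_cost(p_q, p_x1_given_q, p_x2_given_q, d_star)
```

Because `d_star` depends on the pair of inputs jointly, the cost cannot in general be split into one term per user, which is why the constraint couples the two input distributions.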

Proof: The achievability part follows from the standard MAC capacity theorem [13, Sec. 15.3.1 and
Thm. 15.3.4] based on random codebooks and typicality decoding, combined with the distortion bounding
procedure in Section III, which uses the asymptotically reliably decoded input sequences $(X_1^n, X_2^n)$ to estimate the
channel states.

To establish the converse, we begin by following the same bounding steps as in the converse proof of
the standard MAC capacity theorem (cf. [13, Sec. 15.3.4]). Considering any sequence of $(\lceil e^{nR_1} \rceil, \lceil e^{nR_2} \rceil, n)$-codes
with $\lim_{n \to \infty} P_e^{(n)} = 0$, the bounding procedure arrives at

$$R_1 \le \frac{1}{n} \sum_{i=1}^{n} I(X_{1,i}; Y_i | X_{2,i}) + \epsilon_n, \tag{55}$$

$$R_2 \le \frac{1}{n} \sum_{i=1}^{n} I(X_{2,i}; Y_i | X_{1,i}) + \epsilon_n, \tag{56}$$

$$R_1 + R_2 \le \frac{1}{n} \sum_{i=1}^{n} I(X_{1,i}, X_{2,i}; Y_i) + \epsilon_n, \tag{57}$$

where $\lim_{n \to \infty} \epsilon_n = 0$.

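Since the considered sequence of $(\lceil e^{nR_1} \rceil, \lceil e^{nR_2} \rceil, n)$-codes also needs to satisfy the state estimation
distortion constraint, we have that the induced average distortion must not exceed $D + \epsilon_n$, i.e.,

$$d^{(n)} = \frac{1}{|\mathcal{M}_1||\mathcal{M}_2|} \sum_{(m_1, m_2) \in \mathcal{M}_1 \times \mathcal{M}_2} \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}\left[ d(S_i, \hat{S}_i) \,\middle|\, X_1^n = f_{1,n}(m_1), X_2^n = f_{2,n}(m_2) \right] \le D + \epsilon_n. \tag{58}$$

Using Lemma 1 as in the converse proof for the single-user case in Section IV, we have from (58) that
for any given sequence of $(\lceil e^{nR_1} \rceil, \lceil e^{nR_2} \rceil, n)$-codes, it is necessary to have

$$\frac{1}{|\mathcal{M}_1||\mathcal{M}_2|} \sum_{(m_1, m_2) \in \mathcal{M}_1 \times \mathcal{M}_2} \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}\left[ d(S_i, h^*(X_{1,i}, X_{2,i}, Y_i)) \,\middle|\, X_{1,i} = x_{1,i}(m_1), X_{2,i} = x_{2,i}(m_2) \right] \le D + \epsilon_n, \tag{59}$$

for all sufficiently large $n$. That is (cf. (20)),

$$\frac{1}{n} \sum_{i=1}^{n} \sum_{x_1 \in \mathcal{X}_1} \sum_{x_2 \in \mathcal{X}_2} P_{X_{1,i}}(x_1) P_{X_{2,i}}(x_2) \, d^*(x_1, x_2) \le D + \epsilon_n, \tag{60}$$

where we use the fact that the two encoders are independent.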

Now, as we let $n$ grow without bound and introduce a uniform random variable $Q$ over $\{1, 2, \ldots, n\}$,
following the same argument as in [13, Sec. 15.3.4], the region of $(R_1, R_2, D)$ described by (55)-(57)
and (60) can be equivalently rewritten as

$$R_1 \le I(X_1; Y | X_2, Q), \tag{61}$$

$$R_2 \le I(X_2; Y | X_1, Q), \tag{62}$$

$$R_1 + R_2 \le I(X_1, X_2; Y | Q), \tag{63}$$

$$D \ge \sum_{(q, x_1, x_2) \in \mathcal{Q} \times \mathcal{X}_1 \times \mathcal{X}_2} P_Q(q) P_{X_1|Q}(x_1|q) P_{X_2|Q}(x_2|q) \, d^*(x_1, x_2). \tag{64}$$



To conclude the proof, we use Carathéodory's theorem to bound the cardinality of $\mathcal{Q}$, as is done for
the standard MAC capacity theorem in [13, Sec. 15.3.3]. The region described by (61)-(64) defines a
connected compact set in four dimensions, and hence we can restrict the cardinality of $\mathcal{Q}$ to at most 5
in the capacity-distortion region. Theorem 4 is thus established.

VIII. Conclusions 

In this paper, we introduced a joint information transmission and channel state estimation problem 
for state-dependent channels, and characterized its fundamental tradeoff by formulating it as a channel 
coding problem with input distribution constrained by an average estimation cost constraint. Key to our 
problem formulation is the assumption that the transmitter is oblivious to the channel state information. 
The resulting capacity-distortion function permits a systematic investigation of the channel's capability 
for transmission and state estimation. We showed that non-coherent communication coupled with channel 
state estimation conditioned on treating the decoded message as training achieves the capacity-distortion 
function. We extended our results to multiple access channels, which leads to a coupled cost constraint 
on the input distributions for the transmitting sources. Future research topics include specializing the 
general framework to particular channel models in realistic applications, and generalizing the results to 
multiuser systems and channels with generally correlated state processes. 
Appendix

A. Proof of Theorem 3

Following the development in [18, Sec. II], the mutual information $I(X; Y)$ is equal to another mutual
information $I(U; T)$, where $U$ and $T$ are the channel input and output of an additive-noise channel

$$T = U + W, \tag{65}$$

with $U = (1/2) \log(|X|^2 + 1)$ and $T = \log|Y|$. The additive noise $W$ is independent of $U$, and has
PDF

$$f_W(w) = 2\exp[2w - \exp(2w)], \quad w \in (-\infty, \infty). \tag{66}$$
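
The noise density (66) can be checked numerically. The Python sketch below integrates it with a midpoint rule and estimates its total mass, its mean, and its differential entropy. The integration window and step count are assumptions of this sketch; the targets are $-\gamma/2$ for the mean (which follows because $e^{2W}$ is a unit-mean exponential random variable under (66)) and $1 - \log 2 + \gamma$ for the entropy.

```python
import math

EULER_GAMMA = 0.5772156649015329

def noise_pdf(w: float) -> float:
    """PDF (66) of the additive noise W."""
    return 2.0 * math.exp(2.0 * w - math.exp(2.0 * w))

def noise_moments(lo: float = -20.0, hi: float = 3.0, steps: int = 46_000):
    """Midpoint-rule estimates of (total mass, E[W], h(W)) over [lo, hi]."""
    dw = (hi - lo) / steps
    mass = mean = entropy = 0.0
    for k in range(steps):
        w = lo + (k + 0.5) * dw
        f = noise_pdf(w)
        # log f is written out explicitly so that f = 0 never reaches log().
        log_f = math.log(2.0) + 2.0 * w - math.exp(2.0 * w)
        mass += f * dw
        mean += w * f * dw
        entropy -= f * log_f * dw
    return mass, mean, entropy

mass, mean, entropy = noise_moments()
```

The density is negligible outside the chosen window, so truncating the integral there should not visibly affect the three estimates.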



Accordingly, the two constraints in (46) can be equivalently rewritten in terms of $U$, and the optimization
problem becomes

$$\max_{P_U} \; I(U; T), \tag{67}$$

$$\text{s.t.} \quad \int_0^\infty e^{2u} \, dP_U(u) \le \rho + 1, \qquad \int_0^\infty e^{-2u} \, dP_U(u) \le D.$$

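1) Lower Bound of $C(D)$: A lower bound of $I(U; T)$ is given by [18, Eqn. (17)]

$$I(U; T) \ge h(U) + \log\sqrt{1 + e^{-2[h(U) - h(W)]}} - h(W). \tag{68}$$

Consider a continuous distribution of $U$ with the following PDF,

$$p_U(u) = 1/\Delta \quad \text{for } u \in [\underline{u}, \underline{u} + \Delta], \tag{69}$$

and zero otherwise. Furthermore, let both constraints in (67) be active, namely,

$$\frac{1}{\Delta} \int_{\underline{u}}^{\underline{u} + \Delta} e^{2u} \, du = \rho + 1, \tag{70}$$

$$\frac{1}{\Delta} \int_{\underline{u}}^{\underline{u} + \Delta} e^{-2u} \, du = D. \tag{71}$$

Such a uniform distribution of $U$ thus leads to a lower bound on $C(D)$ as

$$C(D) \ge \log\Delta + \log\sqrt{1 + e^{-2[\log\Delta - 1 + \log 2 - \gamma]}} - 1 + \log 2 - \gamma, \tag{72}$$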

where we note that $h(W) = 1 - \log 2 + \gamma$ [18, Lem. 2.1]. Hence in order to characterize the asymptotic
behavior of the lower bound, we only need to investigate how $\Delta$ scales with $\rho$ and $D$. To this end, we
solve (70) and (71) jointly, to get

$$e^{2\Delta} = 2\Delta^2(\rho + 1)D \left[ 1 + \sqrt{1 + \frac{1}{\Delta^2(\rho + 1)D}} \right] + 1. \tag{73}$$

The right-hand side of (73) is a monotone increasing function of $\Delta^2(\rho + 1)D$ over $(0, \infty)$, increasing
from one to infinity. Consider the scaling of $\rho \to \infty$ and $\lim_{\rho \to \infty} D\rho^{\alpha} = \kappa$, where $0 < \alpha < 1$ and
$0 < \kappa < \infty$ are both constants. There are only two possibilities in such an asymptotic regime: $\Delta \to 0$
or $\Delta \to \infty$. If $\Delta \to 0$, then from (70) it follows that $e^{2\underline{u}} \approx \rho + 1$, which, when combined with (71),
leads to $(\rho + 1)D \approx 1$. But this is in contradiction with the assumption that $D\rho \approx \kappa\rho^{1-\alpha} \to \infty$. So the
only possibility is $\Delta \to \infty$, and from (73) we can further bound $\Delta$ through

$$e^{2\Delta} \ge 4\Delta^2(\rho + 1)D \ge 4\rho D, \tag{74}$$

which leads to

$$\frac{e^{2\Delta}}{\rho^{1-\alpha}} \ge 4D\rho^{\alpha} \to 4\kappa. \tag{75}$$

Consequently, we have for sufficiently large $\rho$,

$$\log\Delta \ge \log\log\rho + \log(1 - \alpha) - \log 2, \tag{76}$$

which leads to

$$C(D) \ge \log\log\rho + \log(1 - \alpha) - 1 - \gamma. \tag{77}$$



On the other hand, if $\lim_{\rho \to \infty} D\rho = \kappa$ (i.e., $\alpha = 1$), then from (73) it is apparent that $\Delta$ does not
grow without bound as $\rho \to \infty$, and consequently the lower bound of $C(D)$ is finite.
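
The width $\Delta$ of the uniform input can also be computed numerically, as sketched below in Python. Multiplying the two active constraints (70) and (71) eliminates the left endpoint of the support and leaves $(\sinh\Delta / \Delta)^2 = (\rho + 1)D$, which is equivalent to (73); the sketch solves this by bisection. The function name and the parameter values are illustrative assumptions.

```python
import math

def solve_delta(rho: float, D: float) -> float:
    """Width Delta of the uniform input (69) making (70) and (71) both active.

    Solves (sinh(Delta) / Delta)^2 = (rho + 1) * D by bisection, which
    requires (rho + 1) * D > 1.
    """
    target = math.sqrt((rho + 1.0) * D)
    if target <= 1.0:
        raise ValueError("need (rho + 1) * D > 1")
    lo, hi = 1e-9, 1.0
    while math.sinh(hi) / hi < target:   # grow the bracket until it holds the root
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if math.sinh(mid) / mid < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

rho, alpha, kappa = 1e6, 0.5, 1.0            # illustrative values only
D = kappa * rho ** (-alpha)
delta = solve_delta(rho, D)
bound_76 = math.log(math.log(rho)) + math.log(1.0 - alpha) - math.log(2.0)
```

For these illustrative values, `math.log(delta)` can be compared against `bound_76` to see the inequality (76) at a finite SNR.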
2) Upper Bound of $C(D)$: From the additive-noise channel model (65), we have

$$I(U; T) = h(T) - h(T|U) = h(T) - h(W) = h(T) - 1 + \log 2 - \gamma. \tag{78}$$



Therefore, an upper bound of C(D) is obtained by upper bounding h(T). 

First, since $e^{-2u}$ is a convex function, from Jensen's inequality, the second constraint in (67) leads to

$$D \ge \int_0^\infty e^{-2u} \, dP_U(u) \ge \exp\left[ -2 \int_0^\infty u \, dP_U(u) \right], \quad \text{i.e.,} \quad \int_0^\infty u \, dP_U(u) \ge \frac{1}{2} \log\frac{1}{D}. \tag{79}$$

This leads to a constraint on the expectation of the additive-noise channel output, as

$$\mathbb{E}[T] = \mathbb{E}[U] + \mathbb{E}[W] \ge \frac{1}{2} \log\frac{1}{D} - \frac{\gamma}{2}, \tag{80}$$

where we note that $\mathbb{E}[W] = -\gamma/2$, because $e^{2W}$ is exponentially distributed with unit mean under (66) (cf. [18, Lem. 2.1]).
Meanwhile, the additive-noise channel output satisfies another constraint as

$$\mathbb{E}[\exp(2T)] = \mathbb{E}[|Y|^2] = \rho + 1. \tag{81}$$

So we can upper bound $h(T)$ by solving the following maximum-entropy problem:

$$\max \; h(T) \tag{82}$$

$$\text{s.t.} \quad \mathbb{E}[T] = A \ge \frac{1}{2} \log\frac{1}{D} - \frac{\gamma}{2}, \qquad \mathbb{E}[\exp(2T)] = \rho + 1.$$

The solution to (82) is (cf. [18, App. A])

$$p_T(t) = \frac{2\mu^{\mu}}{\Gamma(\mu)(\rho + 1)^{\mu}} \exp\left[ 2\mu t - \frac{\mu}{\rho + 1} \exp(2t) \right], \tag{83}$$

where $\Gamma(\cdot)$ is the Gamma function defined as $\Gamma(z) = \int_0^\infty t^{z-1} e^{-t} \, dt$ for $z \in \mathbb{C}$ with $\mathrm{Re}(z) > 0$, $\mu > 0$ is
determined by the equation

$$\log\mu - \psi(\mu) = \log(\rho + 1) - 2A, \tag{84}$$

and $\psi(\mu) = \frac{d}{d\mu} \log\Gamma(\mu)$ is the Psi function (also known as the digamma function) [19]. Such a
maximum-entropy PDF $p_T(t)$ leads to an upper bound of $C(D)$ as

$$C(D) \le \log\Gamma(\mu) - \mu\psi(\mu) + \mu - 1 - \gamma. \tag{85}$$

For the bound to hold for every admissible $A$ in (82), we first notice that the right-hand side of (85) is decreasing
in $\mu > 0$, so that the bound is largest when $\mu$ is minimized. We then notice that the left-hand side of
(84) is decreasing in $\mu > 0$, so that the minimum allowed $\mu$ is attained when $A$ takes its smallest admissible
value $A = (1/2) \log(1/D) - \gamma/2$, and we rewrite (84) as

$$\log\mu - \psi(\mu) = \log[(\rho + 1)D] + \gamma. \tag{86}$$

Now, consider the scaling of $\rho \to \infty$ and $\lim_{\rho \to \infty} D\rho^{\alpha} = \kappa$, where $0 < \alpha < 1$ and $0 < \kappa < \infty$ are
both constants. The right-hand side of (86) hence scales like $(1 - \alpha) \log\rho + \log\kappa + \gamma + o(1)$, and we need
to enforce $\mu \to 0$ as $\rho \to \infty$. More precisely, by noting that $\psi(\mu) = -1/\mu - \gamma + \pi^2\mu/6 - \ldots$ for $\mu \approx 0$,
we can write the left-hand side of (86) as $1/\mu - \log(1/\mu) + \gamma + o(1)$. Comparing the two sides yields
$1/\mu = (1 - \alpha) \log\rho + O(\log\log\rho)$. On the other hand, as $\mu \to 0$, the upper bound of $C(D)$ in (85) scales like

$$C(D) \le \log(1/\mu) - \gamma + o(1). \tag{87}$$

Substituting $1/\mu = (1 - \alpha) \log\rho + O(\log\log\rho)$ into (87), we finally reach

$$C(D) \le \log\log\rho + \log(1 - \alpha) - \gamma + o(1). \tag{88}$$

Finally, if $\lim_{\rho \to \infty} D\rho = \kappa$ (i.e., $\alpha = 1$), then (86) approaches

$$\log\mu - \psi(\mu) = \log\kappa + \gamma \tag{89}$$

as $\rho \to \infty$, whose solution $\mu$ is finite and bounded away from zero. Consequently, the upper bound of
$C(D)$ is finite.
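
The small-$\mu$ behavior used in the last step of the upper bound can be checked numerically. The Python sketch below evaluates the expression $\log\Gamma(\mu) - \mu\psi(\mu) + \mu - 1 - \gamma$ from (85) and compares it with the approximation $\log(1/\mu) - \gamma$ from (87). The digamma routine is a simple recurrence-plus-series implementation written for this sketch (the standard library offers `math.lgamma` but no digamma), and the test value of $\mu$ is arbitrary.

```python
import math

EULER_GAMMA = 0.5772156649015329

def digamma(x: float) -> float:
    """Digamma function psi(x) for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    while x < 6.0:                 # psi(x) = psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # psi(x) ~ log x - 1/(2x) - 1/(12 x^2) + 1/(120 x^4) - 1/(252 x^6)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 / x - series

def upper_bound_85(mu: float) -> float:
    """Right-hand side of (85): log Gamma(mu) - mu * psi(mu) + mu - 1 - gamma."""
    return math.lgamma(mu) - mu * digamma(mu) + mu - 1.0 - EULER_GAMMA

def small_mu_approximation(mu: float) -> float:
    """Leading terms of (87): log(1/mu) - gamma."""
    return -math.log(mu) - EULER_GAMMA

mu = 1e-3                          # arbitrary small test value
exact = upper_bound_85(mu)
approx = small_mu_approximation(mu)
```

The difference between `exact` and `approx` shrinks with $\mu$, consistent with the $o(1)$ term in (87).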